Randomized benchmarking of single and multi-qubit control in liquid-state NMR quantum information processing
Being able to quantify the level of coherent control in a proposed device implementing a quantum 
information processor (QIP) is an important task for both comparing different devices and assessing 
a device's prospects with regards to achieving fault-tolerant quantum control. We implement in a 
liquid-state nuclear magnetic resonance QIP the randomized benchmarking protocol presented by 
Knill et al (PRA 77: 012307 (2008)). We report an error per randomized $\pi/2$ pulse of $(1.3 \pm 0.1) \times 10^{-4}$
with a single qubit QIP and show an experimentally relevant error model where the randomized 
benchmarking gives a signature fidelity decay which is not possible to interpret as a single error 
per gate. We explore and experimentally investigate multi-qubit extensions of this protocol and 
report an average error rate for one and two qubit gates of $(4.7 \pm 0.3) \times 10^{-3}$ for a three qubit QIP.
We estimate that these error rates are still not decoherence limited and thus can be improved with 
modifications to the control hardware and software. 

PACS numbers: 03.67.Ac, 03.67.Lx, 



I. INTRODUCTION 

Quantum information processing devices have the po- 
tential to revolutionize our understanding of computa- 
tional complexity and solve certain problems exponen- 
tially faster than current classical algorithms. In order 
to achieve these goals the ability to coherently control a 
large number of two level quantum systems (qubits) will 
have to be demonstrated. An important issue in this re- 
search path is to be able to quantify the level of control 
demonstrated. A clear, systematic and standardized al- 
gorithm is needed to be able to report the relevant level 
of control achieved in a given system. Such a protocol 
would be useful in a number of ways: it should provide 
a fair and transparent way to compare different devices 
and technologies; it should provide a way to quantify en- 
gineering improvements to the same device and it should 
provide a rough measure of the device's prospects with 
regards to fault-tolerant computation [![. 

Full characterization of any quantum process, and 
hence calculation of the fidelity of control, is possible 
through a procedure known as quantum process tomog- 
raphy (QPT) 0. However, there are a number of caveats 
with this approach. It is difficult to analyse and to recon- 
struct a completely positive map from the results when 
there are errors in the preparation and readout steps 
and/or there is noise in the measurements @. Indeed, to
quantify the error in a certain gate with QPT, readout 
and preparation pulses with a lower error level than the 
gate being measured are required. QPT gives full char- 
acterization of a particular quantum gate in a particular 
setting. Although this is useful information, it does not 
necessarily tell us how another gate will perform, or even 
how the same gate will perform as part of a larger compu- 
tation. Finally, full QPT requires an exponential number 
of experiments, making it experimentally prohibitive for 
QIPs larger than a few qubits.

Ultimately, full knowledge of a quantum operation is 
often not needed to provide an answer to the above prob- 
lems. Randomization has been proposed as a useful tech- 
nique in revealing a smaller number of relevant coarse- 
grained parameters of the channel p[. By twirling a chan- 
nel with random, Haar distributed, unitaries the channel 
is reduced to a depolarizing channel with a single param- 
eter to describe the strength of the noise and thus the 
average gate fidelity. This approach benefits from the 
concentration of measure in large Hilbert spaces whereby
the average fidelity can be estimated with only a few ex- 
periments Q. This technique can be generalized to a 
sequence of random unitaries, and a fidelity decay is measured
as a function of the increasing number of gates. The rate
of fidelity decay can then be measured and related to the 
average gate fidelity. 

Generating fully Haar-random unitaries for this pro- 
tocol is inefficient as it requires an exponential num- 
ber of continuous parameters and thus an exponential 
number of elementary gates to describe and create a
Haar-random unitary gate. Fortunately, previous work 
has shown that the Clifford group is a unitary 2-design, 
meaning it is sufficient to sample from the n-qubit Clifford
group to depolarize an n-qubit channel and to estimate
its average fidelity 0, 0]. Efficient methods exist for
generating random Clifford gates from elementary 1 and 
2 qubit gates [H, Q and it is even possible to reduce the 
number of gates required by using pseudo-random Clif- 
ford gates from either a prescribed algorithm Q , or sim- 
ply multiplying together randomly chosen 1 and 2 qubit 
gates @ . Randomized benchmarking of single qubit Clif- 
ford group gates was formalized in a protocol presented 
by Knill et al. [13], where the fidelity decay under a sequence
of random Clifford group operations is measured
and the average gate fidelity can then be calculated. 

Liquid state NMR offers a clean system with high fi- 
delity control built on decades of engineering experience 
in NMR spectroscopy. Utilizing this control, liquid-state 
NMR QIPs have established many demonstrations of
quantum algorithms and simulations [11, 12] and are an
ideal testbed for exploring ideas about quantum control
for quantum information processing purposes [HI. Here
we present results of applying these randomized bench- 
marking protocols to both single and multiple qubit gates 
in a liquid state NMR QIP. In these experiments we are 
able to quantify the control achieved by both standard 
pulse techniques on a single qubit and more advanced 
pulse shaping approaches from optimal control theory in 
the multi-qubit setting. While our single qubit exper- 
iments followed Ref. [lj|, there are potentially many 
generalizations of the protocol to more than one qubit 
and we suggest two such protocols. Finally, it is difficult 
to obtain analytical results in the case of benchmarking 
pulse dependent errors. Indeed, we find and analyze an 
experimentally relevant error model where randomized 
benchmarking fails to reveal a single average error per 
gate. This serves to highlight the difficulty in devising 
universal efficient benchmarking protocols. 



II. PROTOCOLS 

The protocols are a form of generalized motion reversal
applied to efficient gate fidelity estimation. The
basic steps are to apply sequences of random unitary 
gates and then measure the average fidelity decay as a 
function of the number of gates. With the assumption 
that the errors are independent of the gate performed 
and that the gates are chosen uniformly according to the 
invariant Haar measure, the series of random gates and 
averaging over different gate sequences will effectively de- 
polarize the noise. That is, the state after a self-inverting 
sequence of n gates is given by: 
where $|\rho(i)\rangle$ is the density matrix $\rho(i)$ after the $i$'th operation
represented as a vector in Liouville space, $\hat{U}_i = U_i^* \otimes U_i$
is the superoperator representation of the unitary gate
$U_i$ and $\hat{\Lambda}$ is the noise superoperator [TJ]. Under the averaging
$\hat{\Lambda}_{ave}$ becomes a depolarizing noise Q ,

$$ \Lambda_{ave}(\rho) = p_\Lambda \rho + (1 - p_\Lambda) \frac{\mathbb{1}}{D}, \tag{2} $$



where D is the dimensionality of the system and the de- 
polarizing parameter is related to the original noise op- 
erator by 
Therefore, we expect the average fidelity of the output 
state with respect to an arbitrary input state after n 
gates to decay exponentially to a saturation level which 
depends on the dimension of the Hilbert space: 



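$$ F_n = p_\Lambda^n \left( \mathrm{Tr}\left[\rho(0)^2\right] - \frac{1}{D} \right) + \frac{1}{D}, \tag{4} $$

where we have defined

$$ F_i \equiv \langle \rho(i) | \rho(0) \rangle = \mathrm{Tr}\left[\rho(i)^\dagger \rho(0)\right]. \tag{5} $$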



Measuring the decay of the average fidelity thus gives 
us concrete information about the strength of the noise,
without giving the details of the action of the noise. From 
an error correction and fault-tolerance perspective, the 
schemes are usually developed regardless of the specifics 
of the action of the noise and the strength of the noise 
is the most relevant piece of information. And from the 
strength of the noise the average gate fidelity can be cal- 
culated: 



$$ F_g(\Lambda) = p_\Lambda + \frac{1 - p_\Lambda}{D}. \tag{6} $$
Because the gate fidelity corresponds to a second order 
polynomial in the gate and its complex conjugate (also 
known as a (2, 2) polynomial), the average gate fidelity 
over the Haar measure can be evaluated using a unitary 
2-design so that the continuous integral over the unitaries 
can be replaced by a sum over, for example, the finite 
Clifford group C [EQ, L 
Then the sequence of random unitaries becomes a se- 
quence of random Clifford group gates. The use of Clif- 
ford gates for benchmarking has a number of justifica- 
tions. Clifford group operations are of paramount impor- 
tance in most fault tolerant constructions based on stabi- 
lizer codes. The Clifford group operations are the main 
computational elements and universality is granted via 
state preparation of so-called "magic states", e.g. states
of the form $\cos\frac{\pi}{8}|0\rangle + \sin\frac{\pi}{8}|1\rangle$ [15]. The performance of
many computational steps can be bootstrapped through 
the use of higher fidelity Clifford group operations, e.g. 
several noisy magic states can be purified with ideal Clif- 
ford gates to create one magic state with a lower error 
rate [la]. Moreover, the state's evolution under Clifford
group operations can be efficiently tracked classically al- 
lowing an efficient construction of a recovery gate and/or 
prediction of the ideal final state Q . 
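
The replacement of the Haar average by a sum over the Clifford group can be checked numerically for one qubit. The following is a minimal sketch, not the procedure used in the experiment: it builds the 24-element single-qubit Clifford group from the Hadamard and phase gates, twirls an arbitrarily chosen noise map (a small over-rotation followed by amplitude damping, both hypothetical), and compares the result with the depolarizing form of Eq. (2). The relation $p_\Lambda = (\mathrm{Tr}\,\hat{\Lambda} - 1)/(D^2 - 1)$ used in the code is the standard one for a trace-preserving map and is an assumption here. A row-major vectorization is used, so the superoperator reads $U \otimes U^*$ rather than $U^* \otimes U$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j]).astype(complex)
D = 2

def canonical(u):
    """Fix the global phase so unitaries equal up to a phase compare equal."""
    flat = u.ravel()
    pivot = flat[np.argmax(np.abs(flat) > 1e-9)]   # first non-zero entry
    return u * (abs(pivot) / pivot)

def clifford_group():
    """Close {H, S} under multiplication, modulo global phase."""
    group, frontier = [I2], [I2]
    while frontier:
        fresh = []
        for g in frontier:
            for gen in (H, S):
                cand = canonical(gen @ g)
                if not any(np.allclose(cand, k) for k in group):
                    group.append(cand)
                    fresh.append(cand)
        frontier = fresh
    return group

def superop(kraus):
    """Row-major Liouville matrix: vec(K rho K^dag) = (K kron K*) vec(rho)."""
    return sum(np.kron(k, k.conj()) for k in kraus)

# Hypothetical noise: over-rotation about X followed by amplitude damping.
gamma, eps = 0.02, 0.05
over_rot = np.cos(eps / 2) * I2 - 1j * np.sin(eps / 2) * X
damping = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
           np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
noise = superop([k @ over_rot for k in damping])

# Twirl: average of C^dag . noise . C over the Clifford group.
group = clifford_group()
twirled = sum(superop([c.conj().T]) @ noise @ superop([c]) for c in group) / len(group)

# Depolarizing map of Eq. (2) with the assumed relation for p.
p = (np.trace(noise).real - 1) / (D**2 - 1)
vec_id = I2.ravel()
depolarizing = p * np.eye(D**2) + (1 - p) * np.outer(vec_id, vec_id.conj()) / D
gate_fidelity = p + (1 - p) / D

print(len(group), np.allclose(twirled, depolarizing), gate_fidelity)
```

The twirled map agrees with the depolarizing map even though the chosen noise is neither unital nor unitary, which is the property the benchmarking protocols rely on.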

The protocols are designed to extract the average gate 
fidelity which under reasonable assumptions about the 
error model should be the computationally relevant quan- 
tity. Algorithms using only Clifford operations, for ex- 
ample many fault-tolerant constructions, can be Pauli 
randomized at every step (whether it occurs inherently
as part of a teleportation [16] or is explicitly put in) and
so the quantity measured by randomized benchmarking 
should be close to the error rate experienced in an al- 
gorithm. It is certainly true that with many qubits the 
Hilbert space is large enough to hide a very poor worst
case fidelity while the average fidelity is very high. And
so, it is possible that some very large, highly correlated 
and specially designed error will be undetected by this 
benchmarking procedure. However, this would seem to 
require a contrived unphysical error model. Furthermore, 
for one and two qubit gates the Hilbert space is too small 
for the worst case and average fidelity to be significantly 
different. Finally, because it is too difficult to show
fault-tolerance for an arbitrary distribution of errors, proofs
[13] and simulations [18] of fault-tolerance schemes rely
on a stochastic distribution of error locations or a depo- 
larizing error model respectively, for which the average 
fidelity should be the relevant quantity to measure. 

A. Single Qubit 

In the case of the single qubit benchmarking we fol- 
lowed exactly the implementation of Knill et al [ic| ]. 
For depolarizing one qubit noise, the single qubit Clif- 
ford operations are isomorphic to the 48 operations 
parametrized as 

$$ C = SP = e^{\pm i \frac{\pi}{4} Q} e^{\pm i \frac{\pi}{2} P}, \tag{8} $$

where $Q \in \{X, Y, Z\}$, $P \in \{\mathbb{1}, X, Y, Z\}$, that is, a $\pi$ pulse
(or Pauli operation) followed by a $\pi/2$ pulse (or a symplectic
operation). The symplectic operations are deemed
the "computationally relevant" operations that advance 
the computation while the Pauli operations serve only to 
redefine the Pauli frame. 
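
As an illustration of the parametrization in Eq. (8), the sketch below enumerates all sign and axis choices and checks that every resulting operation maps Pauli operators to Pauli operators under conjugation, i.e. that it is a Clifford operation. The helper names are hypothetical and the check is purely numerical.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = (I2, X, Y, Z)

def expi(angle, op):
    """exp(i * angle * op) for any op that squares to the identity."""
    return np.cos(angle) * I2 + 1j * np.sin(angle) * op

def is_pauli_up_to_phase(m):
    """True if the unitary m equals a Pauli operator times a phase."""
    return any(np.isclose(abs(np.trace(p.conj().T @ m)), 2.0) for p in PAULIS)

def is_clifford(u):
    """A Clifford gate maps every Pauli to a Pauli under conjugation."""
    return all(is_pauli_up_to_phase(u @ p @ u.conj().T) for p in (X, Y, Z))

# C = S P: the Pauli (pi) pulse acts first, then the pi/2 pulse.
operations = [
    expi(s_q * np.pi / 4, q) @ expi(s_p * np.pi / 2, p)
    for q, s_q, p, s_p in itertools.product((X, Y, Z), (1, -1), PAULIS, (1, -1))
]
all_clifford = all(is_clifford(c) for c in operations)
print(len(operations), all_clifford)
```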

The circuit implemented is shown in Figure 1. To perform
an approximate averaging, a series of 192 computational
gates was chosen at random and truncated at a
series of different lengths. Random Pauli operations were
then inserted between each computational gate. The initial
state was chosen to be the thermal state in NMR:
$\frac{1}{2}\mathbb{1} + \epsilon Z$ (where $\epsilon \approx 10^{-5}$). The identity component is
unobservable in NMR and can be considered a large error
in the preparation or measurement which is normalized
out by the protocol. The state was tracked through the
computational gates and the recovery gate R was chosen
to return the state to either $+Z$ or $-Z$ with equal
probability. The state was then read out with a 90 degree
readout pulse and the fidelity measured by comparing 
the integral of the signal to a reference spectrum. For 
each truncation, the Pauli operations were randomized 
8 times. Each point was further averaged over four dif- 
ferent computational gate sequences and the averaged 
fidelity from the 32 experiments for each truncation was 
used in the fitting. 
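
The procedure above can be mimicked in a small simulation. The sketch below is only an illustration under stated assumptions: the sole error is a hypothetical fractional over-rotation applied to every pulse (including Z pulses, which are treated here as physical pulses rather than frame changes), the recovery gate is taken to be the ideal, error-free inverse of the whole sequence, and fresh random sequences are drawn for each length instead of truncating one long sequence. The decay is fitted to $F_n = \frac{1}{2} + \frac{1}{2} p^n$ and the error per gate is reported as $(1 - p)/2$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
AXES = (X, Y, Z)

def pulse(axis, angle):
    """Rotation exp(-i * angle / 2 * axis) about a Pauli axis."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * axis

def sequence_fidelity(n_gates, over_rotation, rng):
    """Fidelity of one random sequence followed by an ideal recovery gate."""
    ideal, actual = I2, I2
    for _ in range(n_gates):
        steps = []
        k = rng.integers(4)                 # Pauli randomization; 0 means no pulse
        if k > 0:
            steps.append((AXES[k - 1], np.pi))
        sign = rng.choice([-1.0, 1.0])      # computational +/- pi/2 pulse
        steps.append((AXES[rng.integers(3)], sign * np.pi / 2))
        for axis, angle in steps:
            ideal = pulse(axis, angle) @ ideal
            actual = pulse(axis, angle * (1 + over_rotation)) @ actual
    final = ideal.conj().T @ actual @ np.array([1, 0], dtype=complex)
    return abs(final[0]) ** 2

def benchmark(lengths, over_rotation, n_sequences=40, seed=0):
    """Average over random sequences and fit a single exponential decay."""
    rng = np.random.default_rng(seed)
    mean_f = np.array([
        np.mean([sequence_fidelity(n, over_rotation, rng) for _ in range(n_sequences)])
        for n in lengths
    ])
    slope, _ = np.polyfit(lengths, np.log(2 * mean_f - 1), 1)
    p = np.exp(slope)
    return mean_f, (1 - p) / 2

mean_fidelity, error_per_gate = benchmark([2, 20, 50, 100], over_rotation=0.02)
print(mean_fidelity, error_per_gate)
```

The fit allows a free intercept, so constant preparation and readout errors would not bias the extracted error per gate.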

FIG. 1: Quantum circuit implementing single qubit benchmarking.
A fiducial state is prepared and a sequence of computational
gates G is applied. The recovery gate R is chosen
to return the system to a known final state. The Pauli gates P
interleaved with the computational gates induce a Pauli
randomization.

One technical point to note is that rotations about
the Z axis are implemented through an abstract frame
change (changing the phase of subsequent pulses and potentially
the observation) and take no time. However,
for consistency, a delay equivalent to the $\pi/2$ or $\pi$ pulse
time was executed for those gates. This is the procedure
followed in [icj |. However, performing the Z rotation in
this manner (as opposed to physically implementing the
gate) is not as effective at depolarizing the noise because
in commuting the Z rotation through the pulse sequence
it is also assumed the Z rotation commutes with the noise
operation. In situations where the noise is dominated by
dephasing this may be appropriate but for a general case
this is not true.



B. Multiple Qubits 

In the case of more than one qubit, it is difficult to 
prescribe the correct gate set for determining an error 
per gate. The gates should depolarize the noise but at 
the same time the error per gate should be meaningful 
in relation to the fault-tolerant thresholds. It would be 
ideal to quantify the error per gate for one and two qubit 
gates and also storage errors for wait steps. However, 
it is difficult to isolate the errors for only these gates 
if the error model does not satisfy the independent error
model, in which each gate's errors are described by a quantum
operation acting only on the qubits which the gate affects.
In realistic situations it is most likely that applying a 
gate to qubit a could induce an error on qubit b. 

One possibility is to choose a generating gate set 
consisting of single qubit Clifford generators (say the 
Hadamard and phase gates) and controlled NOT's be- 
tween pairs of qubits. This will generate the multi-qubit 
Clifford group and indeed after only a small number of 
gates will approximate a 2-design necessary for depolariz- 
ing the noise 0]. The multi-qubit protocol then becomes: 

1. Choose a series of lengths of computational gates 
to measure the fidelity decay at. The number of 
random gates necessary to achieve depolarization 
of the noise depends on the number of qubits and 
may be large. Thus we expect only the asymptotic 
error rate to be meaningful. 

2. For each truncation length choose $n_g$ random sequences
of computational gates from the generating
set of the full n-qubit Clifford group. 

3. Determine a recovery sequence which will return 
the state to one with a known definite output upon 
measurement in the absence of error. This can ei- 
ther undo the entire sequence to return to the in- 
put state or ensure that one stabilizer has a certain 
measurement outcome as suggested in Ref. fiol ]. 
Because the Clifford group operations can be effi- 
ciently tracked this is possible to do efficiently on 
a classical computer and should have no more than 
$O(n^2/\log n)$ gates Q.

4. Apply some parallelization routine to the random
sequence of gates to ensure that the number of wait 
steps does not grow with the size of the computer. 
This parallelization step allows a fair comparison 
between QIPs of different sizes, say a 5 and a 50 qubit
computer. The error per time step may be larger 
in the 50 qubit computer but many more gates are 
possible in each timestep. 

5. Measure the fidelity decay as in the single qubit 
case. An exponential fit to the fidelity decay will 
reveal the average error per one and two qubit gate. 
It is possible that the average error could mask a 
distribution of error rates such that for example 
all single qubit gates are perfect but the two qubit 
gates are much worse. However, more detailed, but 
still coarse grained information is available by doing 
more experiments (see Sec. IV).

Numerical simulations have confirmed that this protocol
will return the correct asymptotic (beyond $\approx 30$
gates for the 3 qubit case) error rate for a variety of error 
models such as dephasing and pulse dependent unitary 
errors. For the latter, it should be mentioned that we
made the assumption that the errors were of the same 
strength, hence numerically verifying the conjecture in 
Ref. [f| for this case. Not surprisingly, larger amounts 
of randomization are required compared with the single 
qubit protocol. 
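
Step 3 of the protocol relies on the fact that a Pauli operator stays a signed Pauli operator under Clifford gates. The sketch below illustrates this for three qubits by brute force with $8 \times 8$ matrices, which is adequate for a demonstration but is not the efficient stabilizer-tracking method the protocol assumes. The gate choice (Hadamard, $PHP^\dagger$, nearest-neighbour CNOTs, with probabilities 2/3 and 1/3) follows the experimental section; all function names are hypothetical and no recovery sequence is constructed.

```python
import numpy as np
from functools import reduce
from itertools import product

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
S = np.diag([1, 1j]).astype(complex)
PAULI = {"I": I2, "X": X, "Y": Y, "Z": Z}
N = 3

def kron_all(ops):
    return reduce(np.kron, ops)

def on_qubit(gate, q):
    """Single-qubit gate acting on qubit q of an N-qubit register."""
    return kron_all([gate if i == q else I2 for i in range(N)])

def cnot(control, target):
    """CNOT built as |0><0| x 1 + |1><1| x X on (control, target)."""
    p0 = np.diag([1, 0]).astype(complex)
    p1 = np.diag([0, 1]).astype(complex)
    keep = kron_all([p0 if i == control else I2 for i in range(N)])
    flip = kron_all([p1 if i == control else (X if i == target else I2)
                     for i in range(N)])
    return keep + flip

def pauli_label(op):
    """Return (sign, label) if op is +/- a Pauli string, else None."""
    for letters in product("IXYZ", repeat=N):
        string = kron_all([PAULI[c] for c in letters])
        coeff = np.trace(string @ op).real / 2**N
        if abs(abs(coeff) - 1) < 1e-9:
            return int(round(coeff)), "".join(letters)
    return None

def random_gate(rng):
    """Single-qubit gate with probability 2/3, neighbouring CNOT otherwise."""
    if rng.random() < 2 / 3:
        gate = (H, S @ H @ S.conj().T)[rng.integers(2)]
        return on_qubit(gate, rng.integers(N))
    a = rng.integers(N - 1)
    pair = (a, a + 1) if rng.random() < 0.5 else (a + 1, a)
    return cnot(*pair)

rng = np.random.default_rng(1)
state = kron_all([Z, I2, I2])            # deviation density matrix ZII
for _ in range(30):
    u = random_gate(rng)
    state = u @ state @ u.conj().T
tracked = pauli_label(state)
print(tracked)
```

Knowing the final signed Pauli string is what allows a recovery sequence with a definite measurement outcome to be designed.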

III. EXPERIMENT 

The experiments were performed in liquid state NMR 
on a 700MHz Bruker Avance spectrometer using a TCI 
cryogenic probe. The cryo-probe provides enhanced sen- 
sitivity and associated improved signal-to-noise ratio but 
the high quality factor of the probe resonant circuit leads 
to phase-transient and radiation damping effects. 

A. Single Qubit 

The proton spins of unlabeled chloroform were chosen 
as the single qubits. A sample was made from a 0.3% 
aqueous solution of unlabeled chloroform dissolved in d6-acetone.
The sample was not vacuum-pumped to avoid
unnecessarily long $T_1$ relaxation times. The $T_1$ was measured
to be 7 seconds through inversion recovery and the
$T_2$ to be 4.5 seconds using a standard CPMG sequence.
The unrefocused $T_2^*$ was 0.45 seconds calculated from
the spectral linewidth of the NMR signal.

To address the amplitude and phase transient issues
with the high Q cryoprobe, 24 µs Gaussian shaped $\pi/2$
pulses were used, which avoid these unwanted effects due
to their more slowly varying amplitude profile. Since
the largest part of the errors is expected to be due to
pulse miscalibration, amplifier drift and r.f. inhomogeneity,
composite pulses robust to r.f. field variation were
also tested. The BB1 family of pulses from Wimperis et
al. [19] are robust to pulse length (calibration) errors $\epsilon$
up to order $\epsilon^6$ and are universally compensating in that
they are robust unitary operations rather than robust for
a particular state to state transformation. Their usefulness
in experimental QIP has been previously reported
[20]. The pulses consist of a compensating block followed
by the desired pulse so that a rotation by an angle $\theta$ about
the x axis can be replaced by

$$ R_x(\theta) = (180)_{\phi_1} (360)_{\phi_2} (180)_{\phi_1} R_x(\theta), \tag{9} $$

where $\phi_1$ and $\phi_2$ depend on the pulse flip angle:

$$ \phi_1 = \frac{1}{3}\phi_2 = \arccos\left(-\frac{\theta}{4\pi}\right). \tag{10} $$

The location of the compensating block is not important
and it can be placed before or after the pulse. The
pulse can even be symmetrized by placing the compensating
block between two halves of the pulse [21].
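
The compensation can be illustrated with ideal, instantaneous rotations. The sketch below assumes the standard BB1 phases $\phi_1 = \arccos(-\theta/4\pi)$ and $\phi_2 = 3\phi_1$, scales every flip angle by a common hypothetical factor $(1 + \epsilon)$ to model a calibration error, and compares the gate infidelity of a plain pulse with that of the composite pulse. Finite pulse length, pulse shape and decoherence are ignored.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def rot(angle, phase):
    """Rotation by `angle` about the axis cos(phase) X + sin(phase) Y."""
    axis = np.cos(phase) * X + np.sin(phase) * Y
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * axis

def bb1(theta, err):
    """BB1 composite pulse with every flip angle scaled by (1 + err)."""
    phi1 = np.arccos(-theta / (4 * np.pi))
    phi2 = 3 * phi1
    s = 1 + err
    block = rot(np.pi * s, phi1) @ rot(2 * np.pi * s, phi2) @ rot(np.pi * s, phi1)
    return rot(theta * s, 0.0) @ block      # compensating block is applied first

def infidelity(u, target):
    """1 - |Tr(target^dag u)| / 2, insensitive to the global phase."""
    return 1 - abs(np.trace(target.conj().T @ u)) / 2

theta, err = np.pi / 2, 0.05                # 5% calibration error
target = rot(theta, 0.0)
plain = infidelity(rot(theta * (1 + err), 0.0), target)
compensated = infidelity(bb1(theta, err), target)
print(plain, compensated)
```

The same function can be evaluated over a range of `err` values to map out how quickly the compensation degrades away from the calibrated power.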

The results of the single qubit benchmarking with BB1
composite pulses are shown in Figure 2. It is clear that
the pulse fidelity is low and furthermore that the curve
does not fit a single exponential decay well. However,
these results can be explained by the r.f. field strength
variation across the sample. This r.f. inhomogeneity
is particularly bad in cryogenic probes [22]. Indeed, by
measuring the r.f. inhomogeneity profile and simulating
the experiment across that variation we were able to reproduce
the results both quantitatively and qualitatively,
showing that we understand the error model well. The result
can be interpreted intuitively in that we expect spins
which see an r.f. field very different to the ideal field to
very quickly end up at some random point on the Bloch
sphere whereas those close to the ideal field strength will
closely track the ideal evolution for many gates. Thus we
expect the fidelity to initially decay quickly (with large
fluctuations) as the spins at the edge of the r.f. profile
are depolarized and then for the fidelity to level off and
decay much more slowly. This intuitive picture is confirmed
in a more detailed analysis in Appendix [X]. It is
also interesting to note that with this pulse-dependent
coherent error model, it is impossible to average the fidelity
decay to a single exponential; it is always a sum of
exponentials with different decay rates. This error model
is not restricted to ensemble effects but would also apply
in the case where a parameter (say a laser power in
an ion trap) slowly varies so that it is constant for the
time of one experiment but fluctuates from experiment
to experiment.
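
A toy model makes the sum-of-exponentials behaviour concrete. In the sketch below each ensemble member is assumed to decay as a single exponential with its own error per gate, and the observed fidelity is the weighted average over members. The three error rates and weights are hypothetical stand-ins for the centre, shoulder and edge of an r.f. profile; they are not the measured profile.

```python
import numpy as np

def ensemble_decay(n_gates, errors_per_gate, weights):
    """Weighted average of single-exponential decays, one per ensemble member."""
    p = 1 - 2 * np.asarray(errors_per_gate)     # D = 2: error = (1 - p) / 2
    w = np.asarray(weights) / np.sum(weights)
    n = np.asarray(n_gates)[:, None]
    return 0.5 + 0.5 * np.sum(w * p**n, axis=1)

n = np.arange(0, 201, 20)
errors = np.array([1e-4, 1e-3, 2e-2])           # hypothetical per-member error rates
weights = np.array([0.6, 0.3, 0.1])             # hypothetical ensemble fractions
fidelity = ensemble_decay(n, errors, weights)

# Local decay rate of the polarization 2F - 1; constant for a single exponential.
local_rate = -np.diff(np.log(2 * fidelity - 1)) / np.diff(n)
print(local_rate)
```

The local rate falls steadily with sequence length: the badly calibrated members are depolarized first and the surviving signal is dominated by the well calibrated ones, which is the fast-then-slow decay described above.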
FIG. 2: (Color online) Experimental (□) fidelity as a function
of number of randomized gates for a single qubit using
BB1 composite pulses plotted on a semi-log plot. The fidelity
decay is clearly non-exponential indicating incoherent pulse
dependent errors [23]. This effect is caused by the large
distribution of r.f. field strengths across the sample shown in
the inset. Also shown are the results from simulations of the
pulse sequence (▽) averaged over the measured r.f. profile.
The simulations match the experimental results both qualitatively
and quantitatively.



In NMR, the issues arising from r.f. inhomogeneity
can be largely eliminated by running an r.f. selection
sequence. This is a sequence of pulses and gradients that
leaves polarization on only a subset of the ensemble of
processors that experience an r.f. field within a certain
range, say ±2% of the ideal field strength [24]. For calibration
purposes and again to avoid the sharp transitions
of hard pulses we developed a numerically optimized control
pulse which implemented the r.f. selection. The
pulse was designed to rotate spins outside the ±2% range
of desired powers to the x-y plane while leaving the calibrated
spins along the z-axis. The unwanted spins were
then dephased using magnetic field gradient techniques.
This dramatically improves the results and gives a single
exponential decay which we fit to give an error per
randomized computational gate of $(1.3 \pm 0.1) \times 10^{-4}$ (see
Figure 3). A drawback of the r.f. selection sequence is
that small fluctuations in the pulse power from the amplifier
or changes in the resonant circuit give large (up to
5%) changes in the output signal. These were normalized
through a stroboscopic observation of the signal after r.f.
selection for each experiment.

An estimate of the expected error rate due to intrinsic
decoherence can be made from the measured $T_1$ and
$T_2$ values. The combined time for a randomized computational
gate using BB1 composite pulses is 516.8 µs
(including delays between pulses to avoid overheating).
A map consisting of purely $T_1$ and $T_2$ decoherence acting
for this time would imply an error per randomized gate
of $5 \times 10^{-5}$. This represents a lower bound on the expected
error rate which we should be able to reach with
hardware and software improvements. If the $T_2^*$ rather
than the $T_2$ is used in the decoherence model, the estimated
error per gate climbs to $4 \times 10^{-4}$. The randomized
gate sequence will somewhat refocus the static field inhomogeneities
contributing to $T_2^*$, but they are not explicitly
refocused. The remaining impediments of incoherence
across the ensemble members and the fluctuations
in power from the amplifier could be overcome with even
more robust and compensated pulses, although there is
a tradeoff between more highly compensated pulses and
the increased losses due to intrinsic decoherence because
of the longer pulse times.
FIG. 3: (Color online) Semi-log plot of the average fidelity as a
function of the number of randomized gates for a single qubit
using BB1 composite pulses after an r.f. selection sequence.
The error bars (68% confidence) indicate the uncertainty from
randomization (i.e. different computational sequences and
Pauli randomizations give different fidelities due to coherent
or biased errors). The uncertainty in each measurement due
to signal to noise and fluctuations in the amount of signal
from the r.f. selection sequence is less than 0.5%. The fidelity
decay is a good fit to a single exponential shown in red (dashed
line) with 68% confidence fits and reveals an error per gate of
$(1.3 \pm 0.1) \times 10^{-4}$.
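
The decoherence-limited estimates quoted in the text can be reproduced with a simple relaxation model. The sketch below assumes, as an illustration rather than as the authors' stated calculation, that during one gate of duration $t$ the transverse Bloch components shrink by $e^{-t/T_2}$ and the longitudinal component by $e^{-t/T_1}$, so that the average gate fidelity is $F = \frac{1}{2} + \frac{1}{6}\left(2e^{-t/T_2} + e^{-t/T_1}\right)$. The numerical inputs are the measured values given in the text.

```python
import numpy as np

def decoherence_error(t_gate, t1, t2):
    """Average gate error of pure T1/T2 relaxation acting for t_gate."""
    fidelity = 0.5 + (2 * np.exp(-t_gate / t2) + np.exp(-t_gate / t1)) / 6
    return 1 - fidelity

T_GATE = 516.8e-6      # randomized BB1 computational gate, seconds
T1, T2, T2_STAR = 7.0, 4.5, 0.45

error_t2 = decoherence_error(T_GATE, T1, T2)
error_t2_star = decoherence_error(T_GATE, T1, T2_STAR)
print(f"{error_t2:.2e}  {error_t2_star:.2e}")
```

Under this model the two cases evaluate to roughly $5 \times 10^{-5}$ and $4 \times 10^{-4}$, in line with the values quoted in the text.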

For comparison purposes, we also tested other pulse
types with the same protocol. Using only simple uncompensated
Gaussian pulses we obtain an error rate of
$(2.1 \pm 0.1) \times 10^{-4}$ and using GRAPE numerically optimized
pulses [25], an error rate of $(1.8 \pm 0.2) \times 10^{-4}$. The
GRAPE pulses were numerically optimized to 99.999%
fidelity (Hilbert-Schmidt (HS) norm) over a range of r.f.
powers ±3% from the ideal power. They were 100 µs in
length and discretized at 1 µs. It is somewhat surprising
that the numerically optimized pulses cannot match
the performance of the BB1 pulses. However, the BB1
pulses are well suited to compensating for systematic deviations
from the ideal pulse shape which manifest themselves
as calibration errors. Numerically optimized pulses
are somewhat robust to noise in the pulse generation: because
the controls are at a local maximum of fidelity, any
deviation gives no change in the fidelity to first order.
However, numerically optimized pulses are still more sensitive
to other imperfections in the implementation. For
example, the optimization and robustness assumes the
control fields are constant at each time step in the discretized
pulse. In the experiment, finite bandwidth effects
and noise prevent exact implementation of this and lead
to a loss of fidelity.



B. Multiple Qubits 

A three qubit molecule was made from a sample of se- 
lectively labelled 13 C tris(trimethylsilyl)silane-acetylene 
dissolved in deuterated chloroform [23|]. The structure 
and a table of the natural Hamiltonian parameters are
shown in Fig. 4.
FIG. 4: (Color online) Structure of tris(trimethylsilyl)silane-acetylene
and a table of natural Hamiltonian parameters (Hz)
obtained from spectral fitting. The diagonal elements give the
chemical shifts with respect to the transmitter frequencies
while the off-diagonal elements give the J-couplings. $T_1$'s and
$T_2$'s (seconds) are measured from standard inversion recovery
and CPMG echo sequences respectively. $C_1$ and $C_2$ were isotopically
labelled with $^{13}$C and the rest of the molecule contained
natural abundances and was ignored for the purposes of this
experiment.

Control was achieved through the GRAPE optimal
control technique [25]. The pulses were optimized to
above 99.95% HS fidelity over a range of r.f. powers
±3% from the ideal power. The pulses were discretized
at 2 µs as a balance between smoothness and spectrometer
memory constraints. Single qubit pulses were 1.2 ms
long; CNOT gates between H and $C_1$ (with any single
qubit gate on $C_2$) were 2.4 ms; and CNOT gates between
$C_1$ and $C_2$ (with any single qubit gate on H) were 4 ms.
These pulses are not time-optimal but have low enough
powers for experimental implementation. Shorter pulses
tended to require infeasibly high power levels which lead
to probe heating during long computational sequences.
Non-linearities in the pulse generation and transient effects
from the probe's resonant circuit lead to distortions
in the implementation of shaped pulses. To avoid
this, the r.f. field at the sample was detected through
a pickup coil and corrected through a simple feedback
loop. This correction procedure was only applied to individual
pulses and the longer term power inverse droop
we observed [3(| was not corrected but should instead
be handled by engineering robust pulses. Due to finite
spectrometer memory we were limited to 120 gates in
a computational sequence. Each truncation was averaged
over 48 different computational gate sequences. The
same numerically optimized r.f. selection sequence used
in the single qubit experiment was applied before each
experiment to the proton nuclei. Polarization on the carbon
nuclei was dephased with gradient techniques giving
the starting deviation density matrix ZII (using product
operator notation).

A sequence of random gates was constructed in the
following manner. The Clifford group generating set was
chosen to be the Hadamard and $PHP^\dagger$ (a Hadamard gate
conjugated by a phase gate) single qubit gates and CNOT
gates between nearest neighbors. With a probability of
2/3, a random single qubit gate was performed and with
probability 1/3, a random CNOT was implemented 0.
The resulting state was then tracked and a recovery sequence
to return the state to ±ZII calculated. To design
the recovery sequence, Hadamard or $PHP^\dagger$ gates were applied
to each qubit such that their individual state was
either I or Z. This state was then transformed into the
final ZII by finding the minimal number of CNOT gates
needed to transfer all the polarization back to the first
qubit. The algorithm is general and efficient in the number
of qubits. These final recovery gates were not counted
in the total number of gates and will not affect the asymptotic
error rate. The entire sequence was then parallelized
with a simplistic iterative scheme of repeatedly checking
whether gates in series could be compressed into a single
parallel gate. For example, a CNOT gate between qubits
2 and 3 followed by a Hadamard gate on qubit 1 would be
compressed to a single timestep which implements both
gates in parallel. The fidelity of the state was then measured
through a readout pulse on the proton spin.
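
The sequence construction and compression can be sketched in a few lines. The code below is an illustration under assumptions: gates are represented as hypothetical `(name, qubits)` tuples with qubits numbered from 0, and the compression is a single greedy pass that merges a gate into the preceding time step whenever the two act on disjoint qubits. The iterative scheme used in the experiment, and any restriction imposed by the available pulse set, may differ in detail.

```python
import random

def random_gate_list(n_gates, n_qubits=3, seed=0):
    """Random generators: single-qubit gate w.p. 2/3, neighbouring CNOT w.p. 1/3."""
    rng = random.Random(seed)
    gates = []
    for _ in range(n_gates):
        if rng.random() < 2 / 3:
            gates.append((rng.choice(["H", "PHPdg"]), (rng.randrange(n_qubits),)))
        else:
            a = rng.randrange(n_qubits - 1)
            gates.append(("CNOT", rng.choice([(a, a + 1), (a + 1, a)])))
    return gates

def parallelize(gates):
    """Merge each gate into the previous time step if they share no qubit."""
    steps = []
    for name, qubits in gates:
        busy = {q for _, used in steps[-1] for q in used} if steps else set()
        if steps and not busy & set(qubits):
            steps[-1].append((name, qubits))
        else:
            steps.append([(name, qubits)])
    return steps

gates = random_gate_list(120)
steps = parallelize(gates)
print(len(gates), "gates compressed into", len(steps), "time steps")

# The example from the text: CNOT on qubits 2 and 3, then a Hadamard on qubit 1.
example = parallelize([("CNOT", (1, 2)), ("H", (0,))])
print(example)
```

Because a gate is only ever merged with its immediate predecessor, the order of operations on every qubit is preserved.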

The results are shown in Figure 5. The results fit
an exponential decay well and give an error per gate of
$(4.7 \pm 0.3) \times 10^{-3}$, approximately an order of magnitude
larger than the single qubit results. Again, an estimate
of the lower bound on the error rate can be obtained from
the measured $T_1$'s and $T_2$'s. Assuming an independent
and uncorrelated error model (which is unlikely but does
not significantly affect the result) gives an average error
per gate of $1.5 \times 10^{-3}$. Moreover, from the design of the
pulses, we would expect an error of $4.4 \times 10^{-4}$, which is
an order of magnitude smaller than the experimentally
measured error rate. This leads us to suspect that
there are still errors in the implementation of the pulses
and/or knowledge of the chemical properties of the
molecule that are not currently handled by our pulse
design.
[Figure 5 plot; horizontal axis: Number of Computational Gates.]

FIG. 5: (Color online) Semi-log plot of the average fidelity
as a function of the number of randomized gates for the 3
qubit benchmarking experiment. The error bars are obtained
from the statistical nature of the randomization and the
fitting of the NMR spectra. To within the error bars a single
exponential provides a good fit and gives an error per gate of
$(4.7 \pm 0.3) \times 10^{-3}$.



IV. EXTENSIONS TO THE MULTI-QUBIT
PROTOCOLS

More detailed information about the errors can be obtained
by combining the ideas of previous randomization
protocols [26, 27] with the randomized computational
sequences. For example, one may wish to determine on
which qubits the errors are occurring or the difference in
error rate between one and two qubit gates. The steps of
the proposed protocol are as follows:

1. Perform the single qubit benchmarking procedure
on each qubit individually. These numbers will give
an estimate of the error per gate for single qubit
gates. Unfortunately, as discussed above, the errors
are unlikely to follow an independent error
model and the possibility that performing the single
qubit gates induces errors on non-target qubits needs
to be checked. This can be achieved by measuring
the fidelity of the identity operation on the other
$n - 1$ qubits. Efficient procedures exist for this
measurement. For example, performing single qubit
Clifford gates at the beginning and their inverses
at the end of the sequence and then randomizing
allows an estimation of the fidelity of the channel
with a small number of experiments [26]. A possible
concern is that the error model on either the
single qubit or the remaining $n - 1$ qubits might be
highly non-Markovian. However, the benchmarking
procedure should effectively act as a randomized
dynamical decoupling sequence preventing
entanglement between the two sub-systems [28].

2. Two qubit benchmarking, using the procedure
described above, can then be performed on all pairs of
qubits. Knowing the single qubit error rates from
the first step, it should be possible to extract an
estimate of the two-qubit error rate. The action on
the other $n - 2$ qubits should be characterized as in
the first step to assess the fidelity of the wait steps.

3. This procedure can be iterated to all groups of 3
qubits and so on, but because most fault-tolerant
constructions are specified in terms of one and two
qubit gates, going as far as all pairs should be
sufficient.
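
As a rough illustration of the bookkeeping behind the single-qubit and pairwise steps of the protocol, the sketch below enumerates the benchmarking targets for an n-qubit register: one per qubit and one per pair of qubits. The function name and the output format are hypothetical. The point is only that stopping at pairs requires n + n(n-1)/2 benchmarking runs, which grows polynomially with n.

```python
from itertools import combinations


def benchmarking_targets(n_qubits):
    """List target subsets: each single qubit, then each pair of qubits."""
    qubits = range(1, n_qubits + 1)
    singles = [(q,) for q in qubits]
    pairs = list(combinations(qubits, 2))
    return singles + pairs


targets = benchmarking_targets(3)
print(targets)
```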



V. CONCLUSIONS 

We have demonstrated implementations of single and 
multi-qubit benchmarking in liquid state NMR. In both 
instances the control is still not decoherence limited 
and improvements through both hardware and software 
should be possible. Potential software improvements include
pulses that are more robust to calibration errors and to noise in
the pulse generation, and better modeling of the system
and apparatus. Efforts in hardware improvements will be
focussed on ensuring the implementation of the optimal 
control pulses is as close as possible to the ideal optimal 
control pulse. 

Simulations and proofs of fault-tolerant constructions 
suggest that given certain architecture assumptions, an 
error rate of $10^{-4}$ is sufficiently low to enable arbitrarily
long quantum computations. Here, the single qubit
experiments demonstrated close to that level of control. 
However, showing fault-tolerant levels of control on small 
demonstration systems does not imply a scalable quan- 
tum computer is possible. Indeed the benchmarking of 
the three qubit system yielded an error per gate an order 
of magnitude worse than the single qubit system. It is 
important to investigate how the level of control scales 
with the system size and how compatible the architecture 
is with the assumptions of the fault-tolerant construction 
before concluding anything about the fault-tolerance ca- 
pabilities of a given system. Multi-qubit benchmarking 
protocols will be an important part of that investigation. 

APPENDIX A: ANALYSIS OF THE R.F.
INHOMOGENEITY MODEL

As discussed in the experimental section, the main 
source of error in our liquid state NMR experiments is 
the r.f. field inhomogeneity across the sample. This er- 
ror model is not specific to NMR and can be applied to 
other systems: if a single instance of an experiment must 
be implemented multiple times, say, to reduce shot noise 
or measure the expected value of a given observable, a 
parameter of the system can be constant for a single run 
of the experiment but fluctuate from one run to another. 
In such a case, the final measurement will be related to 
an average over a distribution of that parameter. Here 
we show analytically how this error model gives a non- 
exponential decay in the single qubit experiment. 

The consequence of the r.f. distribution is that not
all the nuclear spins in the sample will have the same
effective rotation under a nutation field. For example,
if we apply a r.f. field calibrated to rotate the spins by
an angle $\pi/2$ about the $u$-axis ($u \in \{x, y, z\}$), the density
matrix describing the state averaged across the sample,
for an initial state $\rho$, is:

$$\bar{\rho} = \int d\epsilon\, g(\epsilon)\, \Lambda_{P_u}(\epsilon)\left(e^{-i\frac{\pi}{4}P_u}\,\rho\, e^{i\frac{\pi}{4}P_u}\right), \tag{A1}$$

where $\Lambda_{P_u}(\epsilon)$ is the superoperator describing the error for
the spins experiencing a field $\epsilon$ away from the ideal field
($\Lambda_{P_u}(\epsilon)(\rho) = e^{-i\epsilon\frac{\pi}{4}P_u}\,\rho\, e^{i\epsilon\frac{\pi}{4}P_u}$), $g(\epsilon)$ is the r.f. distribution
and $P_u$ the appropriate rotation matrix. The error model
arising from r.f. field inhomogeneity is an over- or under-rotation
(by an amount $\epsilon$) along the same axis as the
actual rotation. With this notation the superoperator
describing a single instance of our single qubit experiment
is written as:
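$$\Lambda_i(n) = \int d\epsilon\, g(\epsilon)\, \Lambda_{S_{i_n}}(\epsilon) S_{i_n} \Lambda_{P_{i_n}}(\epsilon) P_{i_n} \cdots \Lambda_{S_{i_2}}(\epsilon) S_{i_2} \Lambda_{P_{i_2}}(\epsilon) P_{i_2}\, \Lambda_{S_{i_1}}(\epsilon) S_{i_1} \Lambda_{P_{i_1}}(\epsilon) P_{i_1}$$

$$\phantom{\Lambda_i(n)} = \int d\epsilon\, g(\epsilon)\, \Lambda_{i_n}(\epsilon) S_{i_n} P_{i_n} \cdots \Lambda_{i_2}(\epsilon) S_{i_2} P_{i_2}\, \Lambda_{i_1}(\epsilon) S_{i_1} P_{i_1}, \tag{A2}$$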



where $\Lambda_{S_{i_j}}(\epsilon)(\rho) = e^{-i\epsilon\frac{\pi}{4}Q_{i_j}}\,\rho\, e^{i\epsilon\frac{\pi}{4}Q_{i_j}}$, $\Lambda_{P_{i_j}}(\epsilon)(\rho) =
e^{-i\epsilon\frac{\pi}{2}V_{i_j}}\,\rho\, e^{i\epsilon\frac{\pi}{2}V_{i_j}}$ and $\Lambda_{i_j}(\epsilon)$ is the cumulative error
superoperator due to sequentially applying faulty $P_{i_j}$ and $S_{i_j}$.
Since the r.f. inhomogeneity error model is completely
correlated with the pulses, the argument in Eq. 1 cannot
be directly applied. Nevertheless, Emerson et al.
conjectured that in the case of pulse dependent error,
the cumulative noise operator after a sufficiently
long sequence will become concentrated about some average
value (which we numerically verified in certain
situations). In the present case, the strength can be
parametrized by the tipping angle of $\Lambda_{i_j}(\epsilon)$ and it can
be easily verified that there are three relevant strengths,
depending on whether $P_{i_j}$ and $S_{i_j}$ are along parallel,
anti-parallel or perpendicular axes, as enumerated in Table I.
In terms of depolarizing action, Eq. A2 is equivalent to



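$$\Lambda_i(n) = \int d\epsilon\, g(\epsilon)\, P_{i_n}^{\dagger} S_{i_n}^{\dagger} \Lambda_{i_n}(\epsilon) S_{i_n} P_{i_n} \cdots P_{i_2}^{\dagger} S_{i_2}^{\dagger} \Lambda_{i_2}(\epsilon) S_{i_2} P_{i_2}\, P_{i_1}^{\dagger} S_{i_1}^{\dagger} \Lambda_{i_1}(\epsilon) S_{i_1} P_{i_1}. \tag{A3}$$



Strength                                                                                   Axis            Probability
$\frac{3\pi}{2}\epsilon$                                                                   Parallel        1/8
$\frac{\pi}{2}\epsilon$                                                                    Anti-parallel   3/8
$\Upsilon = 2\cos^{-1}\left[\cos\left(\frac{\pi}{2}\epsilon\right)\cos\left(\frac{\pi}{4}\epsilon\right)\right]$   Perpendicular   1/2

TABLE I: Table giving the three possible strength parameters
for the cumulative error of a $\pi$ pulse followed by a $\pi/2$ pulse due
to r.f. inhomogeneity. Since the pulses are random and drawn
from the set of 48 pulses described in Eq. 8, each strength
has a different probability of occurring.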
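
The three strengths listed in Table I can be checked numerically with a short sketch. It assumes that the cumulative error is the product of an over-rotation by $\pi\epsilon$ (from the $\pi$ pulse) and an over-rotation by $\pi\epsilon/2$ (from the $\pi/2$ pulse) about axes that are parallel, anti-parallel or perpendicular. This reading of the table, the value of `eps` and the helper names are assumptions made for illustration.

```python
import numpy as np

# Pauli matrices and identity
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rot(axis, angle):
    """SU(2) rotation exp(-i * angle/2 * axis) about a Pauli axis."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * axis


def tip_angle(U):
    """Rotation angle of an SU(2) matrix, from |Tr U| = 2 |cos(angle/2)|."""
    return 2 * np.arccos(min(1.0, abs(np.trace(U)) / 2))


eps = 0.05                               # hypothetical fractional r.f. deviation
a, b = np.pi * eps, np.pi * eps / 2      # over-rotations of the pi and pi/2 pulses

parallel = tip_angle(rot(X, b) @ rot(X, a))
anti_parallel = tip_angle(rot(X, b) @ rot(-X, a))
perpendicular = tip_angle(rot(Y, b) @ rot(X, a))
upsilon = 2 * np.arccos(np.cos(np.pi * eps / 2) * np.cos(np.pi * eps / 4))

print(parallel, 3 * np.pi * eps / 2)
print(anti_parallel, np.pi * eps / 2)
print(perpendicular, upsilon)
```

Each printed pair should agree to numerical precision, which reproduces the three entries of the Strength column.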



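The key observation to make is that each subset of
$S$ and $P$ yielding a given noise strength parameter is
sufficient to depolarize that given noise, e.g.

$$\frac{1}{|X_{\frac{3\pi}{2}\epsilon}|} \sum_{P,S \in X_{\frac{3\pi}{2}\epsilon}} P^{\dagger} S^{\dagger}\, \Lambda_{\frac{3\pi}{2}\epsilon}\, S P = \Lambda^{\rm ave}_{\frac{3\pi}{2}\epsilon}, \tag{A4}$$

where $X_{\frac{3\pi}{2}\epsilon} = \{SP \mid S$ and $P$ are pulses along parallel axis$\}$
and $\Lambda^{\rm ave}_{\frac{3\pi}{2}\epsilon}$ is the depolarized channel associated with
the cumulative noise of strength $\frac{3\pi}{2}\epsilon$ with depolarizing
parameter

$$p_{\frac{3\pi}{2}\epsilon} = \frac{4\cos^2\left(\frac{3\pi}{4}\epsilon\right) - 1}{3}. \tag{A5}$$

Once the randomization over different gate sequences
is performed, each $\Lambda_{i_j}$ in Eq. A3 will be randomized to
a channel given by a weighted sum of the three different
depolarizing channels, i.e.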
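$$\langle \Lambda_i(n) \rangle_i = \int d\epsilon\, g(\epsilon) \prod_{j=1}^{n} \left[ \frac{1}{8}\Lambda^{\rm ave}_{\frac{3\pi}{2}\epsilon} + \frac{3}{8}\Lambda^{\rm ave}_{\frac{\pi}{2}\epsilon} + \frac{1}{2}\Lambda^{\rm ave}_{\Upsilon} \right] = \int d\epsilon\, g(\epsilon) \left[\Lambda^{\rm ave}_{\epsilon}\right]^{n}. \tag{A6}$$
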

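Therefore, the effective averaged channel action is given
by

$$\Lambda^{\rm ave}_{\epsilon}(\rho) = p\,\rho + (1-p)\,\frac{\mathbb{1}}{2}, \qquad p = \frac{1}{8}\,p_{\frac{3\pi}{2}\epsilon} + \frac{3}{8}\,p_{\frac{\pi}{2}\epsilon} + \frac{1}{2}\,p_{\Upsilon}. \tag{A7}$$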



The gate fidelity obtained by numerically integrating
Eq. A6 using the measured r.f. distribution is compared
to numerical simulations of the experimental sequences
under the measured r.f. distribution in Fig. 6, which
clearly demonstrates the non-exponential behavior of the
decay.
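
A minimal sketch of this kind of numerical integration is given here. It assumes the depolarizing parameter of a twirled rotation by angle $\theta$ is $(4\cos^2(\theta/2) - 1)/3$, weights the three strengths of Table I by 1/8, 3/8 and 1/2, and raises the result to the power $n$ before averaging over $\epsilon$. The measured r.f. distribution is not reproduced in this section, so a Gaussian $g(\epsilon)$ with a hypothetical width `sigma` stands in for it; all function names are illustrative.

```python
import numpy as np


def depol_param(theta):
    """Depolarizing parameter of a twirled rotation by angle theta."""
    return (4 * np.cos(theta / 2) ** 2 - 1) / 3


def averaged_p(eps):
    """Weighted depolarizing parameter for one randomized step."""
    upsilon = 2 * np.arccos(np.cos(np.pi * eps / 2) * np.cos(np.pi * eps / 4))
    return (depol_param(1.5 * np.pi * eps) / 8
            + 3 * depol_param(0.5 * np.pi * eps) / 8
            + depol_param(upsilon) / 2)


def decay(n_steps, sigma=0.03, n_grid=2001):
    """Average p(eps)**n over a hypothetical Gaussian r.f. distribution."""
    eps = np.linspace(-5 * sigma, 5 * sigma, n_grid)
    g = np.exp(-eps ** 2 / (2 * sigma ** 2))
    g /= g.sum()                      # normalize weights on the uniform grid
    p = averaged_p(eps)
    return np.array([np.sum(g * p ** n) for n in n_steps])


steps = np.array([0, 50, 100, 150, 200])
d = decay(steps)
ratios = d[1:] / d[:-1]
# For a single exponential these ratios would all be equal.
print(ratios)
```

The ratios grow with the number of steps: spins with small $|\epsilon|$ decay slowly and dominate at long times, so the averaged curve flattens relative to a single exponential.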

FIG. 6: Numerical simulation and analytical prediction of
the fidelity decay of the randomized benchmarking protocol
under a r.f. inhomogeneity error model plotted on a semi-log
plot. The agreement of the two curves demonstrates that the error
model is well understood. The small discrepancy is due to the
finite number of runs in the numerical simulations. This curve
decays faster than that in Figure 2 because this analysis uses
simple pulses whereas the experiment used robust composite
pulses.



